14 research outputs found

    A dynamic texture based approach to recognition of facial actions and their temporal models

    In this work, we propose a dynamic texture-based approach to the recognition of facial Action Units (AUs, atomic facial gestures) and their temporal models (i.e., sequences of temporal segments: neutral, onset, apex, and offset) in near-frontal-view face videos. Two approaches to modeling the dynamics and the appearance in the face region of an input video are compared: an extended version of Motion History Images (MHI) and a novel method based on Nonrigid Registration using Free-Form Deformations (FFDs). The extracted motion representation is used to derive motion orientation histogram descriptors in both the spatial and temporal domains. Per AU, a combination of discriminative, frame-based GentleBoost ensemble learners and dynamic, generative Hidden Markov Models detects the presence of the AU in question and its temporal segments in an input image sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, the proposed method achieved an average event recognition accuracy of 89.2 percent for the MHI method and 94.3 percent for the FFD method. The generalization performance of the FFD method has been tested using the Cohn-Kanade database. Finally, we also explored the performance on spontaneous expressions in the Sensitive Artificial Listener data set.
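
    As a rough illustration of the motion orientation histogram descriptors mentioned above, the sketch below (a minimal Python/NumPy example, not the authors' implementation) bins a dense motion field, such as the displacement field produced by FFD registration, into per-block orientation histograms weighted by motion magnitude; the number of bins and the spatial grid are illustrative assumptions.

        import numpy as np

        def motion_orientation_histogram(flow, n_bins=8, grid=(4, 4)):
            """Quantize a dense motion field (H x W x 2) into per-block
            orientation histograms weighted by motion magnitude."""
            h, w, _ = flow.shape
            angles = np.arctan2(flow[..., 1], flow[..., 0])          # in [-pi, pi]
            magnitudes = np.hypot(flow[..., 0], flow[..., 1])
            bin_idx = np.floor((angles + np.pi) / (2 * np.pi) * n_bins).astype(int)
            bin_idx = np.clip(bin_idx, 0, n_bins - 1)                # angle == pi falls in the last bin

            descriptor = []
            bh, bw = h // grid[0], w // grid[1]
            for by in range(grid[0]):
                for bx in range(grid[1]):
                    block = (slice(by * bh, (by + 1) * bh), slice(bx * bw, (bx + 1) * bw))
                    hist = np.bincount(bin_idx[block].ravel(),
                                       weights=magnitudes[block].ravel(),
                                       minlength=n_bins)
                    descriptor.append(hist / (hist.sum() + 1e-8))    # per-block normalisation
            return np.concatenate(descriptor)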

    RCEA: Real-time, Continuous Emotion Annotation for collecting precise mobile video ground truth labels

    Collecting accurate and precise emotion ground truth labels for mobile video watching is essential for ensuring meaningful predictions. However, video-based emotion annotation techniques either rely on post-stimulus discrete self-reports, or allow real-time, continuous emotion annotation (RCEA) only in desktop settings. Following a user-centric approach, we designed an RCEA technique for mobile video watching, and validated its usability and reliability in a controlled indoor study (N=12) and a subsequent outdoor study (N=20). Drawing on physiological measures, interaction logs, and subjective workload reports, we show that (1) RCEA is perceived to be usable for annotating emotions while watching mobile videos, without increasing users' mental workload, and (2) the resulting time-variant annotations are comparable with the intended emotion attributes of the video stimuli (classification error for valence: 8.3%; arousal: 25%). We contribute a validated annotation technique and an associated annotation fusion method that are suitable for collecting fine-grained emotion annotations while users watch mobile videos.
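
    The reported classification errors can be understood with a small sketch: continuous valence/arousal annotations are reduced to affect quadrants and compared against the intended emotion attribute of each clip. This is a hedged illustration only; the scale, the quadrant mapping, and the helper names are assumptions, not the paper's actual fusion method.

        def quadrant(valence, arousal):
            """Map a continuous valence/arousal sample (assumed in [-1, 1]) to
            one of four affect quadrants by sign."""
            return (valence >= 0, arousal >= 0)

        def classification_error(samples, intended_quadrant):
            """Fraction of annotation samples whose quadrant disagrees with the
            intended emotion attribute of the stimulus."""
            mismatches = sum(quadrant(v, a) != intended_quadrant for v, a in samples)
            return mismatches / len(samples)

        # Hypothetical usage: a stream of (valence, arousal) samples for one clip
        # intended to elicit high-valence, high-arousal emotion.
        samples = [(0.4, 0.1), (0.6, -0.2), (0.5, 0.3)]
        print(classification_error(samples, intended_quadrant=(True, True)))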

    Fusion of facial expressions and EEG for implicit affective tagging

    The explosion of user-generated, untagged multimedia data in recent years generates a strong need for efficient search and retrieval of this data. The predominant method for content-based tagging is slow, labour-intensive manual annotation. Consequently, automatic tagging is currently a subject of intensive research. However, it is clear that the process will not be fully automated in the foreseeable future. We propose to involve the user and investigate methods for implicit tagging, wherein users' responses to the interaction with the multimedia content are analysed in order to generate descriptive tags. Here, we present a multi-modal approach that analyses both facial expressions and electroencephalography (EEG) signals for the generation of affective tags. We perform classification and regression in the valence-arousal space and present results for both feature-level and decision-level fusion. We demonstrate improvement in the results when using both modalities, suggesting that the two modalities contain complementary information.
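
    The two fusion strategies can be sketched as follows (a minimal example using scikit-learn ridge regressors; the regressor choice, its regularisation strength, and the decision-level weighting are assumptions, not the paper's exact models): feature-level fusion concatenates the facial-expression and EEG feature vectors before training a single model, while decision-level fusion trains one model per modality and combines their outputs.

        import numpy as np
        from sklearn.linear_model import Ridge

        def feature_level_fusion(X_face, X_eeg, y, X_face_test, X_eeg_test):
            """Concatenate modality features, then fit a single regressor."""
            model = Ridge(alpha=1.0).fit(np.hstack([X_face, X_eeg]), y)
            return model.predict(np.hstack([X_face_test, X_eeg_test]))

        def decision_level_fusion(X_face, X_eeg, y, X_face_test, X_eeg_test, w=0.5):
            """Fit one regressor per modality and average their predictions."""
            face_model = Ridge(alpha=1.0).fit(X_face, y)
            eeg_model = Ridge(alpha=1.0).fit(X_eeg, y)
            return w * face_model.predict(X_face_test) + (1 - w) * eeg_model.predict(X_eeg_test)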

    Non-rigid registration using free-form deformations for recognition of facial actions and their temporal dynamics

    In this paper we propose an appearance-based approach to recognition of facial Action Units (AUs) and their temporal segments in frontal-view face videos. Non-rigid registration using free-form deformations is used to determine motion in the face region of an input video. The extracted motion fields are then used to derive motion histogram descriptors. Per AU, a combination of ensemble learners and Hidden Markov Models detects the presence of the AU in question and its temporal segment in each frame of an input sequence. When tested for recognition of all 27 lower and upper face AUs, occurring alone or in combination in 264 sequences from the MMI facial expression database, an average sequence classification rate of 94.3% was achieved.
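
    To illustrate how frame-level detections and a temporal model can be combined, the sketch below decodes the most likely sequence of temporal segments (neutral, onset, apex, offset) from per-frame scores with a plain Viterbi pass. The emission scores would in practice come from the frame-based ensemble learners; the transition structure and all probabilities here are illustrative assumptions rather than the trained HMMs from the paper.

        import numpy as np

        SEGMENTS = ["neutral", "onset", "apex", "offset"]

        def viterbi(log_emissions, log_trans, log_start):
            """Most likely segment sequence given per-frame emission
            log-likelihoods (T x 4), a 4 x 4 log transition matrix, and
            log start probabilities."""
            T, S = log_emissions.shape
            delta = np.full((T, S), -np.inf)
            back = np.zeros((T, S), dtype=int)
            delta[0] = log_start + log_emissions[0]
            for t in range(1, T):
                scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: previous state i -> current state j
                back[t] = scores.argmax(axis=0)
                delta[t] = scores.max(axis=0) + log_emissions[t]
            state = int(delta[-1].argmax())
            path = [state]
            for t in range(T - 1, 0, -1):
                state = int(back[t, state])
                path.append(state)
            return [SEGMENTS[s] for s in reversed(path)]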

    Continuous Emotion Detection in Response to Music Videos

    Viewers' preferences in multimedia selection depend strongly on their emotional experience. In this paper, we present an emotion detection method for music videos using central and peripheral nervous system physiological signals as well as multimedia content analysis. A set of 40 music clips eliciting a broad range of emotions was first selected. The one-minute emotional highlight of each video was extracted and shown to 32 participants while their physiological responses were recorded. Participants self-reported their felt emotions after watching each clip by means of arousal, valence, dominance, and liking ratings. The physiological signals included electroencephalogram, galvanic skin response, respiration pattern, skin temperature, electromyograms, and blood volume pulse measured with a plethysmograph. Emotional features were extracted from the signals and the multimedia content, and were used to train a linear ridge regressor to detect emotions for each participant using a leave-one-out cross-validation strategy. The performance of the personalized emotion detection is shown to be significantly superior to that of a random regressor.
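
    The per-participant evaluation can be sketched as a leave-one-clip-out loop (a minimal scikit-learn example; the regularisation strength and feature shapes are assumptions): for each of the 40 clips, the ridge regressor is trained on the remaining clips and predicts the held-out one.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import LeaveOneOut

        def loo_ridge_predictions(X, y, alpha=1.0):
            """Leave-one-clip-out predictions for a single participant:
            X is (n_clips, n_features), y is (n_clips,) with, e.g., arousal ratings."""
            preds = np.zeros(len(y), dtype=float)
            for train_idx, test_idx in LeaveOneOut().split(X):
                model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
                preds[test_idx] = model.predict(X[test_idx])
            return preds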

    DELEX
